-
Notifications
You must be signed in to change notification settings - Fork 13.5k
Always inline InterpCx::layout_of after perf regression #143334
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
That's not quite the right command -- and indeed you won't have permission. @bors2 try |
Always inline InterpCx::layout_of after perf regression Followup to #142721 to fix the performance regression. I ran one quick benchmark locally (`ctfe-stress-5`) and it does seem to be faster. I further tried adding `#[inline(always)]` to `compiler/rustc_middle/src/ty/layout.rs` in `layout_of()` under `LayoutOfHelpers` but that didn't change the benchmark results at all. `@rust-timer` build e7e3c9e (I'm not sure I have permission to do the above) r? `@RalfJung` <!-- homu-ignore:start --> <!-- If this PR is related to an unstable feature or an otherwise tracked effort, please link to the relevant tracking issue here. If you don't know of a related tracking issue or there are none, feel free to ignore this. This PR will get automatically assigned to a reviewer. In case you would like a specific user to review your work, you can assign it to them by using r? <reviewer name> --> <!-- homu-ignore:end -->
Argh... |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (1ab17c0): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary 2.3%, secondary 1.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -0.6%, secondary 0.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 462.452s -> 461.112s (-0.29%) |
So, uh, somehow this actually makes things even worse...? |
It seems potentially non-trivial to optimize away the drop glue for this type ENTERED_TRACING_SPAN = ();
fn enter_tracing_span(tracing::Span) -> Self::ENTERED_TRACING_SPAN { () } The macro then simply expands to |
@rustbot author |
Reminder, once the PR becomes ready for a review, use |
I couldn't use this directly as "associated type defaults are unstable" and, most importantly, since trait implementations would be able to override fn enter_trace_span(_span: tracing::Span) -> impl EnteredTraceSpan { () }
trait EnteredTraceSpan {}
impl EnteredTraceSpan for () {}
impl EnteredTraceSpan for tracing::span::EnteredSpan {} I again ran a very quick benchmark and again the latest commit seems slightly faster than the one before on my device, but it obviously shouldn't be trusted, so please run another full benchmark. @rustbot ready |
Hopefully this will make tracing calls be optimized out properly when tracing is disabled
Ah, fair.
Interesting, I would have just removed the default. ;) |
@bors2 try |
This comment has been minimized.
This comment has been minimized.
Always inline InterpCx::layout_of after perf regression Followup to #142721 to fix the performance regression. I ran one quick benchmark locally (`ctfe-stress-5`) and it does seem to be faster. I further tried adding `#[inline(always)]` to `compiler/rustc_middle/src/ty/layout.rs` in `layout_of()` under `LayoutOfHelpers` but that didn't change the benchmark results at all. `@rust-timer` build e7e3c9e (I'm not sure I have permission to do the above) r? `@RalfJung` <!-- homu-ignore:start --> <!-- If this PR is related to an unstable feature or an otherwise tracked effort, please link to the relevant tracking issue here. If you don't know of a related tracking issue or there are none, feel free to ignore this. This PR will get automatically assigned to a reviewer. In case you would like a specific user to review your work, you can assign it to them by using r? <reviewer name> --> <!-- homu-ignore:end -->
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (ce51ffb): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -3.9%, secondary -0.6%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (secondary -1.9%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -1.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 460.935s -> 461.093s (0.03%) |
Fun, so that does seem to help for ctfe-stress, but other benchmarks don't like it, and it doesn't fully negate the effect of #143334. Can you try adding The only other thing I can think of is an entirely different, closure-based API. But given that the regression only affects secondary benchmarks / stress tests, I don't think that's worth it. |
I added inline(always) to the machine hook and also opened #143520 as an alternative using a closure (I'm not sure it's what you had in mind but given the simple change I wanted to try it anyway). Could you run the benchmarks on both? Thanks! |
Fix perf regression caused by tracing See #143334, this is another alternative that may be worth benchmarking as suggested in #143334 (comment). r? `@RalfJung` <!-- homu-ignore:start --> <!-- If this PR is related to an unstable feature or an otherwise tracked effort, please link to the relevant tracking issue here. If you don't know of a related tracking issue or there are none, feel free to ignore this. This PR will get automatically assigned to a reviewer. In case you would like a specific user to review your work, you can assign it to them by using r? <reviewer name> --> <!-- homu-ignore:end -->
@bors2 try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Always inline InterpCx::layout_of after perf regression Followup to #142721 to fix the performance regression. I ran one quick benchmark locally (`ctfe-stress-5`) and it does seem to be faster. I further tried adding `#[inline(always)]` to `compiler/rustc_middle/src/ty/layout.rs` in `layout_of()` under `LayoutOfHelpers` but that didn't change the benchmark results at all. `@rust-timer` build e7e3c9e (I'm not sure I have permission to do the above) r? `@RalfJung` <!-- homu-ignore:start --> <!-- If this PR is related to an unstable feature or an otherwise tracked effort, please link to the relevant tracking issue here. If you don't know of a related tracking issue or there are none, feel free to ignore this. This PR will get automatically assigned to a reviewer. In case you would like a specific user to review your work, you can assign it to them by using r? <reviewer name> --> <!-- homu-ignore:end -->
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (e5441bd): comparison URL. Overall result: ❌✅ regressions and improvements - please read the text belowBenchmarking this pull request means it may be perf-sensitive – we'll automatically label it not fit for rolling up. You can override this, but we strongly advise not to, due to possible changes in compiler perf. Next Steps: If you can justify the regressions found in this try perf run, please do so in sufficient writing along with @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -1.6%, secondary -2.0%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -1.2%, secondary 3.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 460.756s -> 461.178s (0.09%) |
Pretty much the same as the previous one. Should we land this or do you want to try the other closure variant in the other PR? |
Followup to #142721 to fix the performance regression. I ran one quick benchmark locally (
ctfe-stress-5
) and it does seem to be faster. I further tried adding#[inline(always)]
tocompiler/rustc_middle/src/ty/layout.rs
inlayout_of()
underLayoutOfHelpers
but that didn't change the benchmark results at all.@rust-timer build e7e3c9e
(I'm not sure I have permission to do the above)
r? @RalfJung